transformer model

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers refer to the draft copy at present; they will be replaced with the correct numbers when the final book is formatted. Chapter numbers are correct and will not change now.

A transformer model operates on time series or sequential data using a form of attention: it identifies past states or tokens that are particularly related to the current token and draws on these when predicting future tokens. This is like the human ability to read a sentence such as "The cat sat in the deep orange glow of sunset and licked its fur": when you reach the word 'licked', your mind instantly pulls out the word 'cat' as related and uses it to make sense of the current point in the sentence. We use rich semantic structures to do this, but transformer models use a vector similarity between a 'key' and a 'query' computed for each token, where the key models the kind of thing the input token is, and the query the kind of thing it would like to connect with.
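To make the key/query idea concrete, the sketch below shows single-head scaled dot-product attention in NumPy. The function name, the projection matrices W_q, W_k and W_v, and the toy dimensions are illustrative assumptions rather than anything from the book, and the causal mask that restricts a decoder to past tokens is omitted for brevity.

    import numpy as np

    def scaled_dot_product_attention(queries, keys, values):
        """Weight each value by the similarity between the current query and its key."""
        d_k = keys.shape[-1]
        # Similarity of every query with every key, scaled to keep magnitudes stable.
        scores = queries @ keys.T / np.sqrt(d_k)
        # Softmax turns similarities into attention weights that sum to 1 per token.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each output is a weighted mix of the value vectors.
        return weights @ values, weights

    # Toy example: 5 tokens, each a 4-dimensional embedding (random for illustration).
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 4))
    W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    output, attention_weights = scaled_dot_product_attention(q, k, v)
    print(attention_weights.round(2))  # row i: how strongly token i attends to each token

In a full transformer this computation is repeated across several attention heads and layers, with the projection matrices learned during training.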

Defined on page 332

Used in Chap. 14: pages 332, 341; Chap. 19: page 464; Chap. 22: page 540; Chap. 23: page 573; Chap. 24: page 582

Also known as transformer networks